Overview

Dataset statistics

Number of variables: 13
Number of observations: 1623
Missing cells: 814
Missing cells (%): 3.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 165.0 KiB
Average record size in memory: 104.1 B
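
The two derived figures above can be cross-checked from the raw counts. This is a minimal sketch, assuming the missing-cell percentage is taken over all cells (variables times observations) and that 1 KiB is 1024 bytes. The in-memory size is itself a rounded figure, so the last digit is only approximate.

```python
# Cross-check of the derived overview figures from the reported counts.
n_variables = 13
n_observations = 1623
missing_cells = 814

# Assumption: the percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 165.0 KiB is the (rounded) total size, with 1 KiB = 1024 bytes.
total_bytes = 165.0 * 1024
avg_record_bytes = total_bytes / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")        # 3.9%
print(f"Average record size: {avg_record_bytes:.1f} B")  # 104.1 B
```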

Variable types

NUM: 8
CAT: 4
BOOL: 1

Reproduction

Analysis started: 2020-07-10 12:35:43.001078
Analysis finished: 2020-07-10 12:35:56.563166
Duration: 13.56 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

date_of_establishment has a high cardinality: 627 distinct values High cardinality
location has a high cardinality: 693 distinct values High cardinality
loc.details has a high cardinality: 180 distinct values High cardinality
location.Code is highly correlated with id High correlation
id is highly correlated with location.Code High correlation
deposit_amount_2011 is highly correlated with headquarter and 5 other fields High correlation
headquarter is highly correlated with deposit_amount_2011 and 5 other fields High correlation
deposit_amount_2012 is highly correlated with headquarter and 5 other fields High correlation
deposit_amount_2013 is highly correlated with headquarter and 5 other fields High correlation
deposit_amount_2014 is highly correlated with headquarter and 5 other fields High correlation
deposit_amount_2015 is highly correlated with headquarter and 5 other fields High correlation
deposit_amount_2016 is highly correlated with headquarter and 5 other fields High correlation
date_of_establishment has 814 (50.2%) missing values Missing
deposit_amount_2011 is highly skewed (γ1 = 39.81107704) Skewed
deposit_amount_2012 is highly skewed (γ1 = 39.64549677) Skewed
deposit_amount_2013 is highly skewed (γ1 = 39.49573447) Skewed
deposit_amount_2014 is highly skewed (γ1 = 39.49528758) Skewed
deposit_amount_2015 is highly skewed (γ1 = 39.28702403) Skewed
deposit_amount_2016 is highly skewed (γ1 = 39.62477434) Skewed
id has unique values Unique
location.Code has unique values Unique
deposit_amount_2011 has 91 (5.6%) zeros Zeros
deposit_amount_2012 has 92 (5.7%) zeros Zeros
deposit_amount_2013 has 92 (5.7%) zeros Zeros
deposit_amount_2014 has 92 (5.7%) zeros Zeros
deposit_amount_2015 has 92 (5.7%) zeros Zeros
deposit_amount_2016 has 92 (5.7%) zeros Zeros
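
The γ1 values of about 39 to 40 reported for the deposit columns are close to √1623 ≈ 40.29. That is the value the adjusted Fisher-Pearson sample skewness takes when one observation dominates 1622 identical ones. The sketch below assumes the report uses that estimator (the one pandas' `Series.skew` computes). It runs on toy data, not on the actual deposits.

```python
import math

def adjusted_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (G1)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(values) / n
    # Biased central moments of order 2 and 3.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    g1 = m3 / m2 ** 1.5
    # Small-sample correction factor.
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

# Toy data (hypothetical): a symmetric sequence and a single dominant value.
symmetric = list(range(1, 1624))
one_dominant = [0.0] * 1622 + [1.0]

print(adjusted_skewness(symmetric))     # symmetric data: no skew
print(adjusted_skewness(one_dominant))  # about sqrt(1623)
```

This pattern matches the sample rows, where a single headquarter record holds deposits far above every other row.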

Variables

id
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct count: 1623
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 812.0
Minimum: 1
Maximum: 1623
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 82.1
Q1: 406.5
Median: 812
Q3: 1217.5
95-th percentile: 1541.9
Maximum: 1623
Range: 1622
Interquartile range (IQR): 811

Descriptive statistics

Standard deviation: 468.6640588
Coefficient of variation (CV): 0.5771724862
Kurtosis: -1.2
Mean: 812
Median Absolute Deviation (MAD): 406
Skewness: 0
Sum: 1317876
Variance: 219646
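
These figures are consistent with id being the integer sequence 1 to 1623. That is an assumption, inferred from the reported minimum, maximum, distinct count and sum rather than stated in the report. Under it, the descriptive statistics can be reproduced with the standard library, using the sample (n - 1) variance:

```python
import statistics

# Assumption: id is the plain sequence 1..1623.
ids = list(range(1, 1624))

total = sum(ids)
mean = statistics.mean(ids)
variance = statistics.variance(ids)   # sample variance, divides by n - 1
std_dev = statistics.stdev(ids)
cv = std_dev / mean                   # coefficient of variation
median = statistics.median(ids)
# Median absolute deviation around the median.
mad = statistics.median(abs(v - median) for v in ids)

print(total, mean, variance, round(std_dev, 7), round(cv, 10), mad)
```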
Histogram with fixed size bins (bins=10)
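Value | Count | Frequency (%)
1623 | 1 | 0.1%
1088 | 1 | 0.1%
1068 | 1 | 0.1%
1070 | 1 | 0.1%
1072 | 1 | 0.1%
1074 | 1 | 0.1%
1076 | 1 | 0.1%
1078 | 1 | 0.1%
1080 | 1 | 0.1%
1082 | 1 | 0.1%
Other values (1613) | 1613 | 99.4%

Value | Count | Frequency (%)
1 | 1 | 0.1%
2 | 1 | 0.1%
3 | 1 | 0.1%
4 | 1 | 0.1%
5 | 1 | 0.1%

Value | Count | Frequency (%)
1623 | 1 | 0.1%
1622 | 1 | 0.1%
1621 | 1 | 0.1%
1620 | 1 | 0.1%
1619 | 1 | 0.1%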

headquarter
Boolean

HIGH CORRELATION

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 12.7 KiB

Value | Count | Frequency (%)
0 | 1622 | 99.9%
1 | 1 | 0.1%

location.Code
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct count: 1623
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1770.3148490449785
Minimum: 5
Maximum: 2870
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.7 KiB

Quantile statistics

Minimum: 5
5-th percentile: 209.3
Q1: 1333
Median: 1834
Q3: 2391.5
95-th percentile: 2780.9
Maximum: 2870
Range: 2865
Interquartile range (IQR): 1058.5

Descriptive statistics

Standard deviation: 751.636198
Coefficient of variation (CV): 0.424577695
Kurtosis: -0.4691148652
Mean: 1770.314849
Median Absolute Deviation (MAD): 534
Skewness: -0.582501805
Sum: 2873221
Variance: 564956.9742
Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
2047 | 1 | 0.1%
1330 | 1 | 0.1%
1302 | 1 | 0.1%
1306 | 1 | 0.1%
1310 | 1 | 0.1%
1312 | 1 | 0.1%
1314 | 1 | 0.1%
1318 | 1 | 0.1%
1320 | 1 | 0.1%
1324 | 1 | 0.1%
Other values (1613) | 1613 | 99.4%

Value | Count | Frequency (%)
5 | 1 | 0.1%
7 | 1 | 0.1%
8 | 1 | 0.1%
9 | 1 | 0.1%
10 | 1 | 0.1%

Value | Count | Frequency (%)
2870 | 1 | 0.1%
2869 | 1 | 0.1%
2868 | 1 | 0.1%
2866 | 1 | 0.1%
2865 | 1 | 0.1%

date_of_establishment
Categorical

HIGH CARDINALITY
MISSING

Distinct count: 627
Unique (%): 77.5%
Missing: 814
Missing (%): 50.2%
Memory size: 12.7 KiB

Value | Count | Frequency (%)
1801-01-01 | 28 | 1.7%
1998-01-07 | 22 | 1.4%
1920-01-01 | 9 | 0.6%
1888-01-01 | 8 | 0.5%
1906-01-01 | 7 | 0.4%
1935-01-05 | 7 | 0.4%
1900-01-01 | 7 | 0.4%
1890-01-01 | 6 | 0.4%
1926-01-01 | 5 | 0.3%
1999-09-03 | 5 | 0.3%
Other values (617) | 705 | 43.4%
(Missing) | 814 | 50.2%

Length

Max length: 10
Median length: 3
Mean length: 6.489217498
Min length: 3

location
Categorical

HIGH CARDINALITY

Distinct count: 693
Unique (%): 42.7%
Missing: 0
Missing (%): 0.0%
Memory size: 12.7 KiB

Value | Count | Frequency (%)
New York City | 78 | 4.8%
Houston | 67 | 4.1%
Brooklyn | 37 | 2.3%
Dallas | 33 | 2.0%
Phoenix | 31 | 1.9%
Bronx | 31 | 1.9%
Tucson | 27 | 1.7%
Columbus | 25 | 1.5%
Baton Rouge | 24 | 1.5%
Austin | 22 | 1.4%
Other values (683) | 1248 | 76.9%

Length

Max length: 19
Median length: 8
Mean length: 8.854590265
Min length: 3

loc.details
Categorical

HIGH CARDINALITY

Distinct count: 180
Unique (%): 11.1%
Missing: 0
Missing (%): 0.0%
Memory size: 12.7 KiB

Value | Count | Frequency (%)
Maricopa | 84 | 5.2%
Harris | 77 | 4.7%
New York | 76 | 4.7%
Cook | 65 | 4.0%
Dallas | 60 | 3.7%
Wayne | 58 | 3.6%
Queens | 41 | 2.5%
Kings | 38 | 2.3%
Franklin | 37 | 2.3%
Westchester | 36 | 2.2%
Other values (170) | 1051 | 64.8%

Length

Max length: 20
Median length: 7
Mean length: 6.964879852
Min length: 3

state
Categorical

Distinct count: 14
Unique (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 12.7 KiB

Value | Count | Frequency (%)
NY | 339 | 20.9%
TX | 282 | 17.4%
OH | 252 | 15.5%
MI | 208 | 12.8%
AZ | 144 | 8.9%
LA | 140 | 8.6%
IL | 119 | 7.3%
NJ | 39 | 2.4%
CT | 28 | 1.7%
WV | 26 | 1.6%
Other values (4) | 46 | 2.8%

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

deposit_amount_2011
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct count: 1520
Unique (%): 93.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 836240.7791127542
Minimum: 0.0
Maximum: 949696500.0
Zeros: 91
Zeros (%): 5.6%
Memory size: 12.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 50784.75
Median: 98578.5
Q3: 181356.75
95-th percentile: 461689.95
Maximum: 949696500
Range: 949696500
Interquartile range (IQR): 130572

Descriptive statistics

Standard deviation: 23664391.51
Coefficient of variation (CV): 28.29853805
Kurtosis: 1596.53699
Mean: 836240.7791
Median Absolute Deviation (MAD): 56863.5
Skewness: 39.81107704
Sum: 1357218784
Variance: 5.600034254e+14
Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
0 | 91 | 5.6%
96172.5 | 2 | 0.1%
61548 | 2 | 0.1%
31500 | 2 | 0.1%
181585.5 | 2 | 0.1%
81405 | 2 | 0.1%
126115.5 | 2 | 0.1%
67957.5 | 2 | 0.1%
41809.5 | 2 | 0.1%
154339.5 | 2 | 0.1%
Other values (1510) | 1514 | 93.3%

Value | Count | Frequency (%)
0 | 91 | 5.6%
1.5 | 1 | 0.1%
247.5 | 1 | 0.1%
2107.5 | 1 | 0.1%
5209.5 | 1 | 0.1%

Value | Count | Frequency (%)
949696500 | 1 | 0.1%
67648614 | 1 | 0.1%
39534582 | 1 | 0.1%
33151629 | 1 | 0.1%
10430034 | 1 | 0.1%

deposit_amount_2012
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct count: 1527
Unique (%): 94.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 984752.9593345657
Minimum: 0.0
Maximum: 1114902000.0
Zeros: 92
Zeros (%): 5.7%
Memory size: 12.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 53303.25
Median: 103470
Q3: 193895.25
95-th percentile: 481739.85
Maximum: 1114902000
Range: 1114902000
Interquartile range (IQR): 140592

Descriptive statistics

Standard deviation: 27822195.32
Coefficient of variation (CV): 28.25296949
Kurtosis: 1587.10057
Mean: 984752.9593
Median Absolute Deviation (MAD): 60190.5
Skewness: 39.64549677
Sum: 1598254053
Variance: 7.740745524e+14
Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
0 | 92 | 5.7%
27235.5 | 2 | 0.1%
70248 | 2 | 0.1%
48249 | 2 | 0.1%
66639 | 2 | 0.1%
68136 | 2 | 0.1%
59295 | 1 | 0.1%
32187 | 1 | 0.1%
488785.5 | 1 | 0.1%
202921.5 | 1 | 0.1%
Other values (1517) | 1517 | 93.5%

Value | Count | Frequency (%)
0 | 92 | 5.7%
237 | 1 | 0.1%
2020.5 | 1 | 0.1%
3277.5 | 1 | 0.1%
4963.5 | 1 | 0.1%

Value | Count | Frequency (%)
1114902000 | 1 | 0.1%
93946476 | 1 | 0.1%
55487037 | 1 | 0.1%
41281806 | 1 | 0.1%
15721390.5 | 1 | 0.1%

deposit_amount_2013
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct count: 1524
Unique (%): 93.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1107470.495378928
Minimum: 0.0
Maximum: 1248682500.0
Zeros: 92
Zeros (%): 5.7%
Memory size: 12.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 57223.5
Median: 112302
Q3: 207204
95-th percentile: 528510.3
Maximum: 1248682500
Range: 1248682500
Interquartile range (IQR): 149980.5

Descriptive statistics

Standard deviation: 31203622.44
Coefficient of variation (CV): 28.17557901
Kurtosis: 1578.406914
Mean: 1107470.495
Median Absolute Deviation (MAD): 64885.5
Skewness: 39.49573447
Sum: 1797424614
Variance: 9.736660537e+14
Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
0 | 92 | 5.7%
173326.5 | 2 | 0.1%
137647.5 | 2 | 0.1%
127350 | 2 | 0.1%
34819.5 | 2 | 0.1%
46689 | 2 | 0.1%
136798.5 | 2 | 0.1%
53697 | 2 | 0.1%
77494.5 | 2 | 0.1%
108627 | 1 | 0.1%
Other values (1514) | 1514 | 93.3%

Value | Count | Frequency (%)
0 | 92 | 5.7%
243 | 1 | 0.1%
1920 | 1 | 0.1%
4131 | 1 | 0.1%
5425.5 | 1 | 0.1%

Value | Count | Frequency (%)
1248682500 | 1 | 0.1%
122940568.5 | 1 | 0.1%
58192191 | 1 | 0.1%
54329896.5 | 1 | 0.1%
18817225.5 | 1 | 0.1%
 

deposit_amount_2014
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct count: 1529
Unique (%): 94.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1220472.520332717
Minimum: 0.0
Maximum: 1374814500.0
Zeros: 92
Zeros (%): 5.7%
Memory size: 12.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 61818.75
Median: 120445.5
Q3: 226321.5
95-th percentile: 578735.85
Maximum: 1374814500
Range: 1374814500
Interquartile range (IQR): 164502.75

Descriptive statistics

Standard deviation: 34354845.66
Coefficient of variation (CV): 28.14880719
Kurtosis: 1578.512689
Mean: 1220472.52
Median Absolute Deviation (MAD): 71122.5
Skewness: 39.49528758
Sum: 1980826900
Variance: 1.18025542e+15
Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
0 | 92 | 5.7%
27117 | 2 | 0.1%
136974 | 2 | 0.1%
76243.5 | 2 | 0.1%
253951.5 | 1 | 0.1%
53421 | 1 | 0.1%
55195.5 | 1 | 0.1%
163495.5 | 1 | 0.1%
172339.5 | 1 | 0.1%
105888 | 1 | 0.1%
Other values (1519) | 1519 | 93.6%

Value | Count | Frequency (%)
0 | 92 | 5.7%
208.5 | 1 | 0.1%
3394.5 | 1 | 0.1%
4162.5 | 1 | 0.1%
4657.5 | 1 | 0.1%

Value | Count | Frequency (%)
1374814500 | 1 | 0.1%
127766710.5 | 1 | 0.1%
78427267.5 | 1 | 0.1%
58728975 | 1 | 0.1%
21045706.5 | 1 | 0.1%
 

deposit_amount_2015
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct count: 1525
Unique (%): 94.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1388776.3086876154
Minimum: 0.0
Maximum: 1548823500.0
Zeros: 92
Zeros (%): 5.7%
Memory size: 12.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 64618.5
Median: 127141.5
Q3: 238721.25
95-th percentile: 614772.45
Maximum: 1548823500
Range: 1548823500
Interquartile range (IQR): 174102.75

Descriptive statistics

Standard deviation: 38776096.5
Coefficient of variation (CV): 27.9210527
Kurtosis: 1566.640917
Mean: 1388776.309
Median Absolute Deviation (MAD): 75238.5
Skewness: 39.28702403
Sum: 2253983949
Variance: 1.50358566e+15
Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
0 | 92 | 5.7%
95983.5 | 2 | 0.1%
67686 | 2 | 0.1%
73122 | 2 | 0.1%
65986.5 | 2 | 0.1%
216211.5 | 2 | 0.1%
336804 | 2 | 0.1%
127141.5 | 2 | 0.1%
132466.5 | 1 | 0.1%
179373 | 1 | 0.1%
Other values (1515) | 1515 | 93.3%

Value | Count | Frequency (%)
0 | 92 | 5.7%
199.5 | 1 | 0.1%
3805.5 | 1 | 0.1%
4023 | 1 | 0.1%
5275.5 | 1 | 0.1%

Value | Count | Frequency (%)
1548823500 | 1 | 0.1%
147483243 | 1 | 0.1%
123612354 | 1 | 0.1%
70655661 | 1 | 0.1%
27182524.5 | 1 | 0.1%
 

deposit_amount_2016
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct count: 1526
Unique (%): 94.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1412397.887245841
Minimum: 0.0
Maximum: 1604137500.0
Zeros: 92
Zeros (%): 5.7%
Memory size: 12.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 69571.5
Median: 134331
Q3: 258485.25
95-th percentile: 648135.45
Maximum: 1604137500
Range: 1604137500
Interquartile range (IQR): 188913.75

Descriptive statistics

Standard deviation: 40037728.32
Coefficient of variation (CV): 28.34734368
Kurtosis: 1586.037873
Mean: 1412397.887
Median Absolute Deviation (MAD): 79812
Skewness: 39.62477434
Sum: 2292321771
Variance: 1.603019689e+15
Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
0 | 92 | 5.7%
100440 | 2 | 0.1%
89076 | 2 | 0.1%
105454.5 | 2 | 0.1%
21832.5 | 2 | 0.1%
160029 | 2 | 0.1%
68163 | 2 | 0.1%
205990.5 | 1 | 0.1%
70015.5 | 1 | 0.1%
125809.5 | 1 | 0.1%
Other values (1516) | 1516 | 93.4%

Value | Count | Frequency (%)
0 | 92 | 5.7%
177 | 1 | 0.1%
3042 | 1 | 0.1%
3943.5 | 1 | 0.1%
5283 | 1 | 0.1%

Value | Count | Frequency (%)
1604137500 | 1 | 0.1%
126853557 | 1 | 0.1%
91053255 | 1 | 0.1%
64990429.5 | 1 | 0.1%
28283470.5 | 1 | 0.1%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exactly linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
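
The calculation just described can be sketched in plain Python. The function name `pearson_r` and the sample data are illustrative and are not taken from the report. The 1/(n - 1) factors of the covariance and the standard deviations cancel, so they are omitted.

```python
import math

def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations; the common 1/(n - 1) cancels.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: an exact line gives r = 1 whatever its slope.
xs = [1, 2, 3, 4]
print(pearson_r(xs, [3, 5, 7, 9]))      # slope 2
print(pearson_r(xs, [30, 50, 70, 90]))  # slope 20, same r
```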

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
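
In other words, ρ is Pearson's r applied to ranks. The sketch below assumes that ties receive the average of the ranks they span, which is a common convention but is not stated in the report. The helper names and data are illustrative.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of X and Y."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data: y = x**3 is monotonic but not linear.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys))  # perfect monotonic association
print(pearson_r(xs, ys))     # below 1, because the relation is not linear
```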

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
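
The pair-counting definition corresponds to what is usually called tau-a. A minimal sketch of it follows, with an illustrative function name and data. Tied pairs count as neither concordant nor discordant here, while statistical libraries often apply a tie-corrected variant (tau-b), so results can differ when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Hypothetical data: one swapped pair out of six.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # (5 - 1) / 6
```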

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

Sample

First rows

index | id | headquarter | location.Code | date_of_establishment | location | loc.details | state | deposit_amount_2011 | deposit_amount_2012 | deposit_amount_2013 | deposit_amount_2014 | deposit_amount_2015 | deposit_amount_2016
0 | 1 | 1 | 5 | 1824-12-31 | Columbus | Delaware | OH | 949696500.0 | 1.114902e+09 | 1.248682e+09 | 1.374814e+09 | 1.548824e+09 | 1.604138e+09
1 | 2 | 0 | 7 | NaN | Scarsdale | Westchester | NY | 439843.5 | 4.661865e+05 | 4.886130e+05 | 4.918950e+05 | 4.916880e+05 | 5.122125e+05
2 | 3 | 0 | 8 | 1964-09-08 | Great Neck | Nassau | NY | 286516.5 | 3.103995e+05 | 3.246585e+05 | 3.569745e+05 | 3.512745e+05 | 3.936825e+05
3 | 4 | 0 | 9 | NaN | Hartsdale | Westchester | NY | 130665.0 | 1.325505e+05 | 1.397445e+05 | 1.644885e+05 | 1.679775e+05 | 1.751580e+05
4 | 5 | 0 | 10 | NaN | Lawrence | Nassau | NY | 258912.0 | 2.591235e+05 | 2.841195e+05 | 2.976675e+05 | 3.077970e+05 | 3.348000e+05
5 | 6 | 0 | 14 | NaN | Mount Vernon | Westchester | NY | 220230.0 | 2.050080e+05 | 2.110170e+05 | 2.314695e+05 | 2.230605e+05 | 2.182485e+05
6 | 7 | 0 | 17 | 1966-11-12 | Bronx | Bronx | NY | 112696.5 | 1.202580e+05 | 1.234995e+05 | 1.418070e+05 | 1.455690e+05 | 1.607490e+05
7 | 8 | 0 | 20 | NaN | Bronx | Bronx | NY | 59832.0 | 6.381900e+04 | 6.570000e+04 | 6.880050e+04 | 7.704450e+04 | 8.503950e+04
8 | 9 | 0 | 21 | NaN | Bronx | Bronx | NY | 110553.0 | 1.050735e+05 | 1.056705e+05 | 1.184190e+05 | 1.210155e+05 | 1.241340e+05
9 | 10 | 0 | 23 | NaN | Bronx | Bronx | NY | 104667.0 | 1.092240e+05 | 1.120935e+05 | 1.132050e+05 | 1.181385e+05 | 1.259670e+05

Last rows

index | id | headquarter | location.Code | date_of_establishment | location | loc.details | state | deposit_amount_2011 | deposit_amount_2012 | deposit_amount_2013 | deposit_amount_2014 | deposit_amount_2015 | deposit_amount_2016
1613 | 1614 | 0 | 2860 | NaN | Fox Point | Milwaukee | WI | 196357.5 | 212155.5 | 227674.5 | 282112.5 | 198720.0 | 212781.0
1614 | 1615 | 0 | 2861 | 1922-01-01 | Milwaukee | Milwaukee | WI | 30301.5 | 33112.5 | 38347.5 | 39847.5 | 43236.0 | 43119.0
1615 | 1616 | 0 | 2862 | NaN | Milwaukee | Milwaukee | WI | 56086.5 | 58680.0 | 62710.5 | 71485.5 | 73122.0 | 76455.0
1616 | 1617 | 0 | 2863 | NaN | Milwaukee | Milwaukee | WI | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
1617 | 1618 | 0 | 2864 | 1913-08-05 | Milwaukee | Milwaukee | WI | 53412.0 | 55384.5 | 61980.0 | 62097.0 | 63099.0 | 67599.0
1618 | 1619 | 0 | 2865 | 1910-02-10 | Cudahy | Milwaukee | WI | 103951.5 | 133564.5 | 138643.5 | 150294.0 | 159280.5 | 152766.0
1619 | 1620 | 0 | 2866 | NaN | Wauwatosa | Milwaukee | WI | 98406.0 | 105657.0 | 114579.0 | 124258.5 | 139989.0 | 150336.0
1620 | 1621 | 0 | 2868 | NaN | Mequon | Ozaukee | WI | 83460.0 | 86874.0 | 98116.5 | 124689.0 | 126501.0 | 137949.0
1621 | 1622 | 0 | 2869 | NaN | Delafield | Waukesha | WI | 81405.0 | 89365.5 | 98139.0 | 93705.0 | 120355.5 | 122323.5
1622 | 1623 | 0 | 2870 | NaN | Eagle | Waukesha | WI | 25537.5 | 25537.5 | 28282.5 | 30828.0 | 35551.5 | 35727.0